PrioritizedReplayBuffer

class cpprb.PrioritizedReplayBuffer(size, env_dict=None, alpha=0.6, *, Nstep=None, eps=0.0001, check_for_update=False, **kwargs)

Bases: cpprb.PyReplayBuffer.ReplayBuffer

Prioritized replay buffer class to store transitions with priorities.

Transitions are sampled with probabilities that depend on their priorities, so transitions with higher priorities are sampled more often.
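What "sampled according to priorities" means can be sketched in a few lines. The sketch computes sampling probabilities \(P(i) = (p_{i} + \epsilon)^{\alpha} / \sum_{k} (p_{k} + \epsilon)^{\alpha}\) from a plain list of priorities and then draws indexes with Python's random module. It assumes the standard proportional prioritization scheme with the constructor's alpha and eps defaults. The helper names sampling_probabilities and sample_indexes are hypothetical and not part of cpprb, which does this work in C++ with segment trees.

```python
import random


def sampling_probabilities(priorities, alpha=0.6, eps=1e-4):
    """Return P(i) = (p_i + eps) ** alpha / sum_k (p_k + eps) ** alpha."""
    # Transform each raw priority the way the class Notes describe.
    scaled = [(p + eps) ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indexes(priorities, batch_size, alpha=0.6, eps=1e-4, rng=random):
    """Draw batch_size indexes with replacement, proportionally to P(i)."""
    probs = sampling_probabilities(priorities, alpha, eps)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)


# Priorities 1 and 4 with alpha=0.5 and eps=0 give probabilities 1/3 and 2/3.
print(sampling_probabilities([1.0, 4.0], alpha=0.5, eps=0.0))
```

With alpha=0 every transformed priority equals 1, so sampling becomes uniform. Larger alpha values make sampling more strongly prioritized.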

Methods Summary

add(self, *[, priorities])

Add transition(s) into replay buffer.

clear(self)

Clear replay buffer

get_max_priority(self)

Get the max priority of stored priorities

on_episode_end(self)

Call on episode end

sample(self, batch_size[, beta])

Sample the stored transitions.

update_priorities(self, indexes, priorities)

Update priorities

Methods Documentation

add(self, *, priorities=None, **kwargs)

Add transition(s) into replay buffer.

Multiple sets of transitions can be added simultaneously.

Parameters
  • priorities (array like or float, optional) – Priorities of the transitions to be added. When no priorities are passed, the maximum priority stored so far is used.

  • **kwargs (array like or float or int) – Transitions to be stored.

Returns

The first index of the stored position. If all transitions are stored into the NstepBuffer and no transitions are stored into the main buffer, None is returned.

Return type

int or None

Raises

KeyError – If any values defined at constructor are missing.

Warning

All values must be passed as keyword arguments. It is the user's responsibility to ensure that all values have the same step size.
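The default-priority rule of add() can be modeled in a few lines. The class below is a hypothetical pure-Python stand-in, not cpprb's implementation. It stores one priority per transition in a list without wrap-around, broadcasts a scalar priority to every added transition, and falls back to the running maximum when priorities is omitted. The initial maximum of 1.0 is an assumption made only for this sketch.

```python
class MaxPriorityTracker:
    """Hypothetical stand-in modeling how add() fills in missing priorities."""

    def __init__(self, initial_max=1.0):
        # Assumption for this sketch: the running maximum starts at 1.0.
        self.max_priority = initial_max
        self.priorities = []  # one entry per stored transition (no wrap-around)

    def add(self, n, priorities=None):
        """Store n transitions and return the first index written."""
        if priorities is None:
            # No priorities given: every new transition gets the current maximum.
            priorities = [self.max_priority] * n
        elif isinstance(priorities, (int, float)):
            # A scalar priority is broadcast to all n transitions.
            priorities = [float(priorities)] * n
        else:
            priorities = [float(p) for p in priorities]
        if len(priorities) != n:
            raise ValueError("expected one priority per transition")
        first = len(self.priorities)
        self.priorities.extend(priorities)
        self.max_priority = max([self.max_priority, *priorities])
        return first


tracker = MaxPriorityTracker()
tracker.add(2)                    # both transitions get priority 1.0
tracker.add(1, priorities=[5.0])  # an explicit priority raises the maximum
tracker.add(2)                    # both transitions now get priority 5.0
print(tracker.priorities)
```

Under this rule, new transitions without an explicit priority are likely to be sampled at least once before update_priorities() corrects their priority.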

clear(self) → void

Clear replay buffer

get_max_priority(self) → float

Get the max priority of stored priorities

Returns

max_priority – the max priority of stored priorities

Return type

float

on_episode_end(self) → void

Call on episode end

Finalize the current episode by moving remaining Nstep buffer transitions, evacuating overlapped data for memory compression features, and resetting episode length.

Notes

Calling this function at the end of each episode is the user's responsibility, since an episode can be terminated at a certain length even if the environment has not set any done flag.
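A typical collection loop therefore calls on_episode_end() after the loop, whether the episode ended because done was set or because a step limit was reached. The sketch below uses only add() and on_episode_end() from this class. The following pieces are all hypothetical, chosen so the example runs without cpprb or a real environment:

- the environment interface (reset(), sample_action(), and a step() returning (next_obs, rew, done));
- the stub classes CountingEnv and RecordingBuffer.

```python
def run_episode(env, rb, max_steps):
    """Collect one episode into rb and always finalize it with on_episode_end()."""
    obs = env.reset()
    for _ in range(max_steps):
        act = env.sample_action()            # hypothetical policy hook
        next_obs, rew, done = env.step(act)  # hypothetical 3-tuple step API
        rb.add(obs=obs, act=act, rew=rew, next_obs=next_obs, done=done)
        obs = next_obs
        if done:
            break
    # Reached on both exits: done was set, or max_steps truncated the episode.
    rb.on_episode_end()


class CountingEnv:
    """Hypothetical environment that never sets done."""

    def reset(self):
        self.t = 0
        return self.t

    def sample_action(self):
        return 0

    def step(self, act):
        self.t += 1
        return self.t, 1.0, False


class RecordingBuffer:
    """Stand-in that records calls; a real PrioritizedReplayBuffer would go here."""

    def __init__(self):
        self.added = 0
        self.episode_ends = 0

    def add(self, **kwargs):
        self.added += 1

    def on_episode_end(self):
        self.episode_ends += 1


rb = RecordingBuffer()
run_episode(CountingEnv(), rb, max_steps=5)
print(rb.added, rb.episode_ends)
```

With a real buffer, rb would be a PrioritizedReplayBuffer whose env_dict defines obs, act, rew, next_obs, and done.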

sample(self, batch_size, beta=0.4)

Sample the stored transitions.

Transitions are sampled according to their priorities, with the specified batch size.

Parameters
  • batch_size (int) – Number of transitions to sample

  • beta (float, optional) – The exponent of the importance-sampling weights, which controls how much of the importance-sampling correction is applied. The default value is 0.4.

Returns

sample – A batch of samples, which also includes ‘weights’ and ‘indexes’

Return type

dict of ndarray

Notes

When ‘beta’ is 0, the weights become uniform. When ‘beta’ is 1, the weights become the usual importance-sampling weights. The ‘weights’ are also normalized by the weight of the minimum priority (\(= w_{i}/\max_{j}(w_{j})\)), which ensures that all weights are \(\leq\) 1.
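A small calculation confirms this normalization. The sketch assumes the standard prioritized-replay definition \(w_{i} = (N \cdot P(i))^{-\beta}\), where \(N\) is the number of stored transitions and \(P(i)\) is the sampling probability. The largest weight belongs to the transition with the smallest probability, so dividing by that weight keeps every weight \(\leq\) 1. The function importance_weights is hypothetical and only mirrors the documented behavior.

```python
def importance_weights(probs, indexes, beta=0.4):
    """Normalized IS weights w_i / max_j w_j for the sampled indexes.

    probs holds P(i) for every stored transition (all assumed > 0, which the
    eps term guarantees); indexes are the sampled positions.
    """
    n = len(probs)
    # The largest weight belongs to the transition with the smallest P(i).
    w_max = (n * min(probs)) ** (-beta)
    return [(n * probs[i]) ** (-beta) / w_max for i in indexes]


probs = [0.1, 0.2, 0.3, 0.4]
print(importance_weights(probs, [0, 3], beta=0.0))  # uniform weights
print(importance_weights(probs, [0, 3], beta=1.0))  # 1 / (N * P(i)), normalized
```

With beta=0.0 every weight is 1.0. With beta=1.0 the weights are proportional to \(1/P(i)\). These match the two limiting cases described above.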

update_priorities(self, indexes, priorities)

Update priorities

Update the priorities at the specified indexes. If this PrioritizedReplayBuffer was constructed with check_for_update=True, indexes whose values have been overwritten since the last call of sample() are ignored.

Parameters
  • indexes (array_like) – indexes to update priorities

  • priorities (array_like) – priorities to update

Raises

TypeError – When indexes or priorities are None.
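The check_for_update=True behavior matters when actors keep adding transitions while a learner computes new priorities for an earlier batch. A slot that was overwritten in between now holds a different transition, so the learner's stale priority for it should be dropped. The class below is a hypothetical model of that bookkeeping (a set of indexes written since the last sample()), not cpprb's internal code.

```python
class UpdateGuard:
    """Hypothetical model of check_for_update=True bookkeeping."""

    def __init__(self):
        self.written = set()  # indexes overwritten since the last sample()

    def on_add(self, indexes):
        # Called whenever transitions are written into these slots.
        self.written.update(indexes)

    def on_sample(self):
        # A new sample() starts a fresh tracking window.
        self.written.clear()

    def filter_update(self, indexes, priorities):
        """Drop (index, priority) pairs whose slot was overwritten."""
        if indexes is None or priorities is None:
            raise TypeError("indexes and priorities must not be None")
        kept = [(i, p) for i, p in zip(indexes, priorities)
                if i not in self.written]
        return [i for i, _ in kept], [p for _, p in kept]


guard = UpdateGuard()
guard.on_sample()     # learner samples indexes 1, 2, 5
guard.on_add([2, 3])  # meanwhile an actor overwrites slots 2 and 3
print(guard.filter_update([1, 2, 5], [0.1, 0.2, 0.5]))
```

In this model, an update for index 2 is silently skipped, while updates to untouched indexes go through.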

__init__()

Initialize PrioritizedReplayBuffer

Parameters
  • size (int) – buffer size

  • env_dict (dict of dict, optional) – dictionary specifying environments. The keys of env_dict become environment names. The values of env_dict, which are also dicts, define “shape” (default 1) and “dtype” (falls back to default_dtype)

  • alpha (float, optional) – \(\alpha\), the exponent applied to the stored priorities. The default value is 0.6.

  • eps (float, optional) – \(\epsilon\), a small positive constant that ensures transitions with zero error can still be sampled. The default value is 1e-4.

  • check_for_update (bool) – If True (default False), this buffer tracks the indexes overwritten since the last call of sample(), so that priorities of already-overwritten values are not updated by mistake. This feature is designed for multiprocess learning.
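As an example, a buffer for a task with 4-dimensional observations could be described by the env_dict below; the empty inner dicts take the documented defaults. The hypothetical helper resolve_env_dict only spells out those defaults (shape 1, dtype falling back to default_dtype) for illustration. The placeholder default_dtype string "float32" is an assumption. In cpprb the dtype values are typically NumPy dtypes such as np.float32.

```python
def resolve_env_dict(env_dict, default_dtype="float32"):
    """Spell out the documented defaults for each entry of env_dict."""
    resolved = {}
    for name, spec in env_dict.items():
        shape = spec.get("shape", 1)  # "shape" defaults to 1
        if isinstance(shape, int):
            shape = (shape,)          # treat an int as a 1-tuple here
        resolved[name] = {
            "shape": tuple(shape),
            "dtype": spec.get("dtype", default_dtype),  # fallback dtype
        }
    return resolved


env_dict = {
    "obs": {"shape": 4},
    "act": {},
    "rew": {},
    "next_obs": {"shape": 4},
    "done": {},
}
print(resolve_env_dict(env_dict))
```

The same env_dict literal is what would be passed as the second argument of PrioritizedReplayBuffer(size, env_dict).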

See also

ReplayBuffer

Any optional parameters of ReplayBuffer are also valid.

Notes

The minimum and the sum over ranges of the pre-computed priorities \((p_{i} + \epsilon)^{\alpha}\) are stored in segment trees, which enable fast sampling.
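The idea behind the segment trees can be sketched with an array-backed binary tree whose leaves hold \((p_{i} + \epsilon)^{\alpha}\). Every internal node stores the sum (and the minimum) of its children. As a result, updating one priority and finding the leaf for a random prefix sum both take \(O(\log N)\) time. This SumMinTree class is an illustrative pure-Python sketch under those assumptions, not cpprb's C++ implementation.

```python
class SumMinTree:
    """Array-backed sum and min segment trees over (p_i + eps) ** alpha values."""

    def __init__(self, capacity):
        size = 1
        while size < capacity:
            size *= 2  # round up to a power of two so the tree is complete
        self.size = size
        self.sums = [0.0] * (2 * size)           # node j sums the leaves below it
        self.mins = [float("inf")] * (2 * size)  # inf marks empty slots

    def set(self, i, value):
        """Write leaf i, then refresh every ancestor in O(log N)."""
        j = i + self.size  # leaves live at positions [size, 2 * size)
        self.sums[j] = value
        self.mins[j] = value
        j //= 2
        while j >= 1:
            self.sums[j] = self.sums[2 * j] + self.sums[2 * j + 1]
            self.mins[j] = min(self.mins[2 * j], self.mins[2 * j + 1])
            j //= 2

    def total(self):
        return self.sums[1]

    def minimum(self):
        return self.mins[1]

    def find_prefix(self, mass):
        """Return leaf i with cumsum(i - 1) <= mass < cumsum(i), for 0 <= mass < total()."""
        j = 1
        while j < self.size:
            left = 2 * j
            if mass < self.sums[left]:
                j = left                  # target lies in the left subtree
            else:
                mass -= self.sums[left]   # skip the left subtree's mass
                j = left + 1
        return j - self.size


tree = SumMinTree(5)
for i, v in enumerate([1.0, 2.0, 3.0]):
    tree.set(i, v)
print(tree.total(), tree.minimum(), tree.find_prefix(2.5))
```

Sampling one index works in two steps:

- Draw u uniformly from [0, total()) and call find_prefix(u).
- Compute minimum() / total() to get the smallest sampling probability, which is the quantity the weight normalization under sample() relies on.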

_encode_sample(self, idx)
_load_transitions_v1(self, data)

get_all_transitions(self, shuffle: bool = False)

Get all transitions stored in replay buffer.

Parameters

shuffle (bool, optional) – When True, transitions are shuffled. The default value is False.

Returns

transitions – All transitions stored in this replay buffer.

Return type

dict of numpy.ndarray
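When shuffle=True, every field of the returned dict must be permuted in the same order. Otherwise an observation would no longer line up with its action and reward. The sketch below shows that idea with plain lists and one shared permutation. The function shuffle_transitions is hypothetical; with NumPy arrays the same effect comes from indexing every array with a single permutation array.

```python
import random


def shuffle_transitions(transitions, seed=None):
    """Shuffle every field with one shared permutation so rows stay aligned."""
    lengths = {len(values) for values in transitions.values()}
    if len(lengths) != 1:
        raise ValueError("expected at least one field, all of the same length")
    n = lengths.pop()
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return {key: [values[i] for i in perm] for key, values in transitions.items()}


data = {"obs": [0, 1, 2, 3], "rew": [0.0, 10.0, 20.0, 30.0]}
shuffled = shuffle_transitions(data, seed=0)
# Rows are reordered, but each reward still matches its observation.
print(shuffled)
```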

get_buffer_size(self) → size_t

Get buffer size

Returns

buffer size

Return type

size_t

get_current_episode_len(self) → size_t

Get current episode length

Returns

episode_len

Return type

size_t

get_next_index(self) → size_t

Get the next index to store

Returns

the next index to store

Return type

size_t

get_stored_size(self) → size_t

Get stored size

Returns

stored size

Return type

size_t

is_Nstep(self) → bool

Get whether the Nstep feature is used

Returns

use_nstep

Return type

bool

load_transitions(self, file)

Load transitions from file

Parameters

file (str or file-like object) – File to read data from

Raises

ValueError – When the file format is wrong.

Warning

To avoid security vulnerabilities, you MUST NOT load untrusted files, since this method is based on pickle through joblib.load.

save_transitions(self, file, *, safe=True)

Save transitions to file

Parameters
  • file (str or file-like object) – File to write data to

  • safe (bool, optional) – If False, more aggressive compression is attempted, which might be incompatible with future versions.
